121 research outputs found

    PIQA: pipeline for Illumina G1 genome analyzer data quality assessment

    Summary: PIQA is a quality analysis pipeline designed to examine genomic reads produced by Next Generation Sequencing technology (Illumina G1 Genome Analyzer). A short statistical summary, together with tile-by-tile and cycle-by-cycle graphical representations of cluster density, quality scores and nucleotide frequencies, allows easy identification of various technical problems, including defective tiles, mistakes in sample/library preparation and abnormalities in the frequencies of appearance of sequenced genomic reads. PIQA is written in the R statistical programming language and is compatible with the bustard, fastq and scarf Illumina G1 Genome Analyzer data formats.
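The cycle-by-cycle nucleotide frequencies mentioned in this summary can be illustrated with a minimal sketch. This is not PIQA's code (PIQA is written in R); the function name `per_cycle_frequencies` and the input format (a plain list of read strings) are assumptions made for illustration only.

```python
from collections import Counter


def per_cycle_frequencies(reads):
    """Return one dict per sequencing cycle mapping base -> relative frequency.

    Hypothetical helper: `reads` is assumed to be a list of read strings,
    where position i in each read corresponds to sequencing cycle i.
    """
    if not reads:
        return []

    n_cycles = max(len(read) for read in reads)
    counts = [Counter() for _ in range(n_cycles)]

    # Tally the base called at each cycle across all reads.
    for read in reads:
        for cycle, base in enumerate(read.upper()):
            counts[cycle][base] += 1

    # Convert raw counts to fractions of the reads covering each cycle.
    frequencies = []
    for cycle_counts in counts:
        total = sum(cycle_counts.values())
        frequencies.append({base: n / total for base, n in cycle_counts.items()})
    return frequencies


if __name__ == "__main__":
    example_reads = ["ACGT", "ACGA", "TCGT", "ACNT"]
    for cycle, freqs in enumerate(per_cycle_frequencies(example_reads), start=1):
        print(cycle, freqs)
```

A cycle where one base (or `N`) suddenly dominates would be the kind of abnormality such a per-cycle plot is meant to expose.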

    Small RNA populations for two unrelated viruses exhibit different biases in strand polarity and proximity to terminal sequences in the insect host Homalodisca vitripennis

    Abstract: Next-generation sequence analyses were used to assess virus-derived small RNA (vsRNA) profiles for Homalodisca coagulata virus-1 (HoCV-1), family Dicistroviridae, and Homalodisca vitripennis reovirus (HoVRV), family Reoviridae, from virus-infected H. vitripennis, the glassy-winged sharpshooter. The vsRNA reads were mapped against the monopartite genome of HoCV-1 and all 12 genome segments of HoVRV, and 21-nt vsRNAs were most common. However, strikingly contrasting patterns for the HoCV-1 and HoVRV genomic RNAs were observed. The majority of HoCV-1 vsRNAs mapped to the genomic positive-strand RNA and, although minor hotspots were observed, vsRNAs mapped across the entire genomic RNA. In contrast, HoVRV vsRNAs mapped to both positive- and negative-sense strands for all genome segments, but different genomic segments showed distinct hotspots. The HoVRV vsRNAs were more common in the 5′ and 3′ regions of all HoVRV segments. These data suggest that taxonomically different viruses in the same host offer different targets for RNA-antiviral defense.

    Longitudinal Metagenomic Analysis of the Water and Soil from Gulf of Mexico Beaches Affected by the Deep Water Horizon Oil Spill

    An estimated 7 × 10^5 cubic meters of crude oil was released into the Gulf of Mexico as a consequence of the April 20th, 2010 Deep Water Horizon drilling rig explosion, leaving thousands of square miles of Earth's surface covered in crude oil. Dispersants were used on large slicks and injected at the well head, resulting in oil being suspended throughout the water column. Starting in June 2010, oil reached hundreds of miles of Louisiana, Alabama, Mississippi, and Florida shoreline, disturbing the ecological balance and economic stability of the region. While visible damage is evident in the wildlife populations and marine estuaries, the most significant effect may be on the most basic level of the ecosystems: the bacterial and plankton populations. We present results from high-throughput DNA sequencing of close-to-shore water and beach soil samples before and during the appearance of oil in Louisiana and Mississippi. Sixteen samples were taken over a two-month period at approximately two-week intervals from Grand Isle, LA and Gulfport, MS and were sequenced using the Illumina GAIIx platform. Significant genomic-based population fluctuations were observed in the soil and water samples. These included large spikes in the human pathogen Vibrio cholerae, a sharp increase in Rickettsiales sp., and a decrease of Synechococcus sp. in water samples. Analysis of the contiguous de novo assembled DNAs (contigs) from the samples also suggested a loss of biodiversity in water samples by the time oil appeared at the shores in both locations. Our observations lead us to the conclusion that oil strongly influenced microbial population dynamics, had a striking impact on the phytoplankton and other flora present prior to the appearance of oil, and that the microbial community had not recovered to pre-spill conditions by the end of our observational period.

    Genomic Epidemiology of Multidrug-Resistant Mycobacterium tuberculosis During Transcontinental Spread

    The transcontinental spread of multidrug-resistant (MDR) tuberculosis is poorly characterized in molecular epidemiologic studies. We used genomic sequencing to understand the establishment and dispersion of MDR Mycobacterium tuberculosis within a group of immigrants to the United States. We used a genomic epidemiology approach to study a genotypically matched (by spoligotype, IS6110 restriction fragment length polymorphism, and mycobacterial interspersed repetitive units-variable number of tandem repeat signature) lineage 2/Beijing MDR strain implicated in an outbreak of tuberculosis among refugees in Thailand and consecutive cases within California. All 46 MDR M. tuberculosis genomes from both Thailand and California were highly related, with a median difference of 10 single-nucleotide polymorphisms (SNPs). The Wat Tham Krabok (WTK) strain is a new sequence type distinguished from all known Beijing strains by 55 SNPs and a genomic deletion (Rv1267c) associated with increased fitness. Sequence data revealed a highly prevalent MDR strain that included several closely related but distinct allelic variants within Thailand, rather than the occurrence of a single outbreak. In California, sequencing data supported multiple independent introductions of WTK with subsequent transmission and reactivation within the state, as well as a potential super spreader with a prolonged infectious period. Twenty-seven drug resistance-conferring mutations and 4 putative compensatory mutations were found within WTK strains. Genomic sequencing has substantial epidemiologic value in both low- and high-burden settings in understanding transmission chains of highly prevalent MDR strains.

    The Oregon Promise Barley Population: A tool for understanding the genetic basis of traits fundamental for barley production, malting, brewing, and distilling

    The simultaneous availability of unique germplasm resources and cost-effective high-throughput genotyping allows for accelerated genome exploration and gene discovery. Our germplasm, the Oregon Promise population, is an array of 200 barley doubled haploids developed from the cross of Full Pint x Golden Promise. The spring 2-row parents have contrasting alleles at two of the dwarfing genes deployed in current varieties. The four homozygous combinations of these plant height alleles lead to contrasting phenotypes, and each allele has pleiotropic effects on a range of other traits. Golden Promise is an iconic variety for malting, brewing, and distilling; Full Pint is a contributor to the craft brew renaissance. Accordingly, the Oregon Promise population will provide a valuable resource for extending current knowledge of malting and brewing genes to the frontiers of sensory assessment. The population shows transgressive segregation for adult plant resistance to stripe rust. As this disease is likely to become increasingly prevalent as a consequence of climate change, expanding the catalog of genes conferring durable resistance to this pathogen is an essential defensive breeding step. The availability of a quick-turnaround and cost-effective SNP genotyping service (400+ markers) at Eureka Genomics (developed in collaboration with the James Hutton Institute) allows accelerated linkage map construction, QTL detection, and unraveling of gene interactions and pleiotropic effects based on the multi-environment, multi-trait phenotyping of the Oregon Promise population. This project is possible thanks to the tools and knowledge generated by the USDA-NIFA T-CAP project. Peer reviewed.

    Effect of the mutation rate and background size on the quality of pathogen identification

    Motivation: Genomic-based methods have significant potential for fast and accurate identification of organisms or even genes of interest in complex environmental samples (air, water, soil, food, etc.), especially when isolation of the target organism cannot be performed for a variety of reasons. Despite this potential, the presence of unknown, variable and usually large quantities of background DNA can cause interference resulting in false positive outcomes. Results: In order to estimate how the genomic diversity of the background (total length of all of the different genomes present in the background), target length and target mutation rate affect the probability of misidentifications, we introduce a mathematical definition for the quality of an individual signature in the presence of a background, based on its length and the number of mismatches needed to transform the signature into the closest subsequence present in the background. This definition, in conjunction with a probabilistic framework, allows one to predict the minimal signature length required to identify the target in the presence of different sizes of backgrounds and the effect of the target's mutation rate on the quality of its identification. The model assumptions and predictions were validated using both Monte Carlo simulations and real genomic data examples. The proposed model can be used to determine appropriate signature lengths for various combinations of target and background genome sizes. It also predicts that no genomic signature will be able to identify the target if its mutation rate is >5%. © The Author 2007. Published by Oxford University Press. All rights reserved.
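The abstract defines signature quality in terms of the signature's length and the number of mismatches separating it from the closest subsequence in the background. The exact formula is not given here, so the sketch below only computes the mismatch count itself, as a brute-force minimum Hamming distance over all equal-length background windows. The function name `min_mismatches` and the use of plain strings are assumptions for illustration, not the authors' implementation.

```python
def min_mismatches(signature: str, background: str) -> int:
    """Smallest number of substitutions turning `signature` into some
    equal-length window of `background` (minimum Hamming distance).

    Hypothetical helper: brute force, O(len(background) * len(signature)).
    Insertions and deletions are not considered.
    """
    k = len(signature)
    if k == 0:
        raise ValueError("signature must be non-empty")
    if len(background) < k:
        raise ValueError("background must be at least as long as signature")

    best = k  # the worst case is a mismatch at every position
    for start in range(len(background) - k + 1):
        window = background[start:start + k]
        distance = sum(a != b for a, b in zip(signature, window))
        if distance < best:
            best = distance
            if best == 0:
                break  # an exact copy exists; nothing can be closer
    return best


if __name__ == "__main__":
    # The closest window "ACGA" differs from the signature at one position.
    print(min_mismatches("ACGT", "TTACGATT"))
```

Under this reading, a signature whose nearest background window is many mismatches away is less likely to produce a false positive, and a larger background makes a close window more likely to occur by chance.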